SemanticBSDD

Improving the GraphQL, JSON and RDF Representations of the buildingSMART Data Dictionary

Vladimir Alexiev, Mihail Radkov, Nataliya Keberle

Objective

  • highlight the defects in the original GraphQL implementation of bSDD
  • present the refactored solution proposed by Ontotext
  • outline the proposed further improvements

bSDD GraphQL Schema: Voyager

bSDD GraphQL Schema: PlantUML

PlantUML is used together with the soml2puml converter tool

Original GraphQL: Findings (1/3)

  • reference entities ReferenceDocument, Country, Unit, Language are disconnected from the rest of the schema
  • relation entities have only an incoming link but no outgoing link
  • many entities cannot be queried directly from the Root
  • no backward arrows to get from a lower-level entity back to its “parent” entity
  • a number of parallel arrows: the GraphQL schema could instead use parameters to distinguish between the different uses

Original GraphQL: Findings (2/3)

At a higher level of detail:

  • Property and ClassificationProperty are very similar, but there is no inheritance or other relation between them
  • PropertyValue and ClassificationPropertyValue are exactly the same, so they can be reduced to one entity
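To illustrate the reduction, here is a minimal Python sketch (not the bSDD or Ontotext schema) in which ClassificationProperty inherits from Property and both share a single PropertyValue type. The field names and the two extra ClassificationProperty fields are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PropertyValue:
    """Single value entity shared by Property and ClassificationProperty."""
    code: str
    value: str
    description: Optional[str] = None


@dataclass
class Property:
    namespace_uri: str
    name: str
    # Allowed values reuse the one PropertyValue type
    allowed_values: List[PropertyValue] = field(default_factory=list)


@dataclass
class ClassificationProperty(Property):
    """Inherits every Property field and adds only classification-specific ones.

    The extra fields below are illustrative assumptions, not the bSDD field list.
    """
    classification_uri: str = ""
    is_required: bool = False
```

Because ClassificationProperty is a subclass, code written against Property also accepts it; in a GraphQL schema the analogous construct would be an interface implemented by both types.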

Original GraphQL: Fragment

Original GraphQL: Findings (3/3)

At an even finer level of detail:

  • inconsistent mix of singular and plural property names, e.g.

    property/properties, relations, synonyms, countriesOfUse, relatedIfcPropertyNames, etc.

Refactored GraphQL: Improvements

  • all entities are queryable directly from the Root
  • deduplication of parallel links
  • each link is named after its target entity
  • navigation between entities is bidirectional, e.g., Classification hierarchy can be navigated both up and down (parentClassification, childClassification)
  • a query can traverse a Relation entity to get data about the related entity:
    • Classification.relation -> ClassificationRelation.related -> Classification
    • Property.relation -> PropertyRelation.related -> Property
  • a single entity PropertyValue is used by both Property and ClassificationProperty
  • all property names are singular
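A minimal in-memory Python sketch of the navigation pattern listed above, assuming simplified Classification and ClassificationRelation records. The attribute names mirror the GraphQL links (parentClassification, childClassification, relation, related), but the set_parent and relate helpers, the sample codes and the relation type string are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Classification:
    code: str
    # Back-references are excluded from repr to avoid infinite recursion on cycles
    parentClassification: Optional[Classification] = field(default=None, repr=False)
    childClassification: List[Classification] = field(default_factory=list, repr=False)
    relation: List[ClassificationRelation] = field(default_factory=list, repr=False)

    def set_parent(self, parent: Classification) -> None:
        """Link both directions at once so up/down navigation stays consistent."""
        if self.parentClassification is not None:
            self.parentClassification.childClassification.remove(self)
        self.parentClassification = parent
        parent.childClassification.append(self)

    def relate(self, other: Classification, relation_type: str) -> ClassificationRelation:
        """Add an outgoing relation; the target is reachable via .related."""
        rel = ClassificationRelation(relation_type, other)
        self.relation.append(rel)
        return rel


@dataclass(eq=False)
class ClassificationRelation:
    relationType: str
    related: Classification = field(repr=False)
```

With both directions stored, a traversal such as `cable.parentClassification.childClassification` yields the siblings, and `cable.relation[0].related` follows Classification.relation -> ClassificationRelation.related -> Classification.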

Refactored GraphQL: Fragment

GraphiQL

Suggested Improvements

Presentation (1/3)

  • uniform identification for the search
  • the same data retrieved from different APIs
  • improve URL structure and consistency

Uniform Identification for the Search (1/3)

As of May 2023, IfcCableSegment has another id on the bSDD search site: https://search.bsdd.buildingsmart.org/Classification/Index/70992

Uniform Identification for the Search (2/3)

IfcCableSegment also has a unique URI:

https://identifier.buildingsmart.org/uri/buildingsmart/ifc-4.3/class/IfcCableSegmentCABLESEGMENT

CableSegment entity as displayed on the bSDD website

Uniform Identification for the Search (3/3)

Non-unique identification violates the FAIR Findability principle

F1: (Meta)data are assigned a globally unique and persistent identifier

Equal Data Retrieved from Different APIs (1/2)

We compared three representations returned by the bSDD server:

  • JSON from the GraphQL API
    • https://test.bsdd.buildingsmart.org/graphiql/
  • JSON from the REST (entity) API
    • curl https://identifier.buildingsmart.org/uri/buildingsmart/<domain>/class|prop/<name>
  • RDF from the REST (entity) API
    • curl -H "Accept: text/turtle" https://identifier.buildingsmart.org/uri/buildingsmart/<domain>/class|prop/<name>

Here class|prop stands for either class or prop.
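The two REST variants differ only in the Accept header, so the comparison can be scripted. A minimal Python sketch using urllib.request that builds (but does not send) the requests; the entity_request helper is hypothetical, and the JSON variant deliberately sends no Accept header, mirroring the plain curl call.

```python
from urllib.request import Request

# Base path taken from the curl examples above
BASE = "https://identifier.buildingsmart.org/uri/buildingsmart"


def entity_request(domain: str, kind: str, name: str, rdf: bool = False) -> Request:
    """Build (not send) a REST entity-API request; kind is 'class' or 'prop'."""
    if kind not in ("class", "prop"):
        raise ValueError("kind must be 'class' or 'prop'")
    url = f"{BASE}/{domain}/{kind}/{name}"
    # RDF is selected by content negotiation, like curl -H "Accept: text/turtle"
    headers = {"Accept": "text/turtle"} if rdf else {}
    return Request(url, headers=headers)
```

Passing the result to urllib.request.urlopen would then fetch the JSON or Turtle body for the comparison.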

Equal Data Retrieved from Different APIs (2/2)

For each class, we selected the entities with the largest number of filled fields and compared the results returned by each API.
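The selection and comparison steps can be sketched in Python as follows, assuming each API response has already been parsed into a flat dictionary whose keys have been normalized to the same field names. The helper names and the sample records are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Values treated as "not filled"
EMPTY = (None, "", [], {})


def filled_fields(entity: Dict[str, Any]) -> int:
    """Count the fields that carry a value (not None, "", [] or {})."""
    return sum(1 for value in entity.values() if value not in EMPTY)


def richest(entities: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Pick the entity with the most filled fields (first one wins on ties)."""
    return max(entities, key=filled_fields)


def diff_fields(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each field whose value differs between two representations to (a, b)."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Normalizing the field names first matters, since otherwise a renamed field (e.g. namespaceUri vs. an RDF predicate) would show up as two spurious differences.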

The differences are here:

Improve URL Structure and Consistency (1/7)

Garijo & Poveda-Villalón, 2020 describe recommendations on ontology URI design, including versioning and opaque URIs, that help accommodate the evolution and multilingualism inherent to bSDD.

Almost all bSDD domain URLs now have the same structure: https://identifier.buildingsmart.org/uri/<org>/<domain>-<version>

URIs could be made more “hackable”, allowing users to navigate the hierarchy by pruning the URI: https://identifier.buildingsmart.org/uri/<org>/<domain>/<version>
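Under the proposed hackable structure, each parent in the hierarchy would be obtained by dropping trailing path segments. A minimal Python sketch, assuming every pruned prefix is meant to resolve (the current server does not guarantee this); the ancestors helper and the class name Foo are hypothetical.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def ancestors(uri: str) -> List[str]:
    """Return the URIs obtained by pruning one trailing path segment at a time."""
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    result = []
    # Longest prefix first: .../<version>/class, .../<version>, .../<domain>, ...
    for i in range(len(segments) - 1, 0, -1):
        path = "/" + "/".join(segments[:i])
        result.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return result
```

For `.../uri/acca/ACCAtest/0.1/class/Foo` this yields the class list, the domain version, the domain, the organization and finally `.../uri`.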

Improve URL Structure and Consistency (2/7)

  • In some cases, the <org> is repeated in the <domain> part
  • In some cases, the <org> name doesn’t quite match the domain name, perhaps due to the way bSDD allocates <org> identifiers to bSDD contributors
    • bim-de/DINSPEC91400: the publisher of this spec is DIN (the German standards organization), not the bim-de initiative
    • digibase/volkerwesselsbv: bimregister.nl news from 2018 suggests that digibase is a new company/initiative within VolkerWessels
    • digibase/nen2699: the publisher of this spec is NEN (the Netherlands standards organization), not the digibase company/initiative
    • digibase/digibasebouwlagen: perhaps the org name digibase should not be repeated as the prefix of the domain bouwlagen (building layers)
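The first issue (the <org> repeated in the <domain>) can be detected mechanically; the second (an <org> that is not the publisher) needs human review. A minimal Python sketch of the mechanical check; both helper names are hypothetical and the comparison is case-insensitive by assumption.

```python
def org_repeated_in_domain(org: str, domain: str) -> bool:
    """True if the domain name starts with the org name (case-insensitive)."""
    return len(domain) > len(org) and domain.lower().startswith(org.lower())


def strip_org_prefix(org: str, domain: str) -> str:
    """Drop a repeated org prefix, keeping the rest of the domain as written."""
    return domain[len(org):] if org_repeated_in_domain(org, domain) else domain
```

Applied to digibase/digibasebouwlagen this yields bouwlagen; it also flags acca/ACCAtest-0.1, while bim-de/DINSPEC91400 passes unchanged because the mismatch there is semantic, not textual.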

Improve URL Structure and Consistency (3/7)

  • Make domain versions explicit:

https://identifier.buildingsmart.org/uri/acca/ACCAtest-0.1

can become

https://identifier.buildingsmart.org/uri/acca/ACCAtest/0.1

A new entity DomainVersion could link all versions of a domain to its master Domain entity.
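The rewrite from `<domain>-<version>` to `<domain>/<version>`, and the grouping of versions under one master domain that a DomainVersion entity would capture, can be sketched in Python. This assumes a version is a trailing `-` followed by dot-separated digits; a domain name that genuinely ends in such a suffix would be misparsed. The helper names and the sample inputs to group_versions are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Greedy domain part, so "REX-OBJ-1.0" splits at the last hyphen
_VERSIONED = re.compile(r"^(?P<domain>.+)-(?P<version>\d+(?:\.\d+)*)$")


def split_domain_version(segment: str) -> Tuple[str, Optional[str]]:
    """Split 'ACCAtest-0.1' into ('ACCAtest', '0.1'); no version gives None."""
    m = _VERSIONED.match(segment)
    if m is None:
        return segment, None
    return m.group("domain"), m.group("version")


def explicit_version_uri(uri: str) -> str:
    """Rewrite .../<domain>-<version> into .../<domain>/<version>."""
    head, _, segment = uri.rstrip("/").rpartition("/")
    domain, version = split_domain_version(segment)
    return uri if version is None else f"{head}/{domain}/{version}"


def group_versions(segments: List[str]) -> Dict[str, List[str]]:
    """Collect all versions per master domain (what DomainVersion would link)."""
    versions: Dict[str, List[str]] = defaultdict(list)
    for s in segments:
        domain, version = split_domain_version(s)
        if version is not None:
            versions[domain].append(version)
    return dict(versions)
```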

Improve URL Structure and Consistency (4/7)

  • Declare URLs to be of the GraphQL type ID and store them in a mandatory field id
    • Most GraphQL implementations call this field simply id, whereas bSDD uses namespaceUri
    • Many nodes do not have their own namespaceUri field, or it is not fully populated
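As a sketch of a possible migration path, assuming GraphQL results arrive as plain dictionaries, a client or resolver could copy namespaceUri into an id field and collect the nodes that lack it for follow-up. The add_ids helper and the sample nodes are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Node = Dict[str, Any]


def add_ids(nodes: List[Node]) -> Tuple[List[Node], List[Node]]:
    """Return (nodes with an id copied from namespaceUri, nodes without a URI)."""
    with_id: List[Node] = []
    missing: List[Node] = []
    for node in nodes:
        uri = node.get("namespaceUri")
        if uri:
            # Keep namespaceUri for backward compatibility; id is the new key
            with_id.append({"id": uri, **node})
        else:
            missing.append(node)
    return with_id, missing
```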

Improve URL Structure and Consistency (5/7)

  • Remove the overlap of entity classes with classificationType values

The key field classificationType specifies the kind of classification. Some of its values overlap with existing entity classes:

Count  classificationType    Overlaps with entity
29     “DOMAIN”              Domain
18     “REFERENCE_DOCUMENT”  ReferenceDocument

Examples of unusual classifications:

https://identifier.buildingsmart.org/uri/ATALANE/REX-OBJ-1.0/class/589b06ad-f802-468b-939c-e60436601a7a is a “REFERENCE_DOCUMENT” with name “décret 2011-321 (23/03/2011)”.

Why is it not a ReferenceDocument entity?
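A minimal Python sketch of how such overlaps could be counted and a “REFERENCE_DOCUMENT” classification turned into a ReferenceDocument-shaped record. The mapping, the output field names (type, name, sourceUri) and the second sample record are hypothetical assumptions, not the bSDD data model.

```python
from collections import Counter
from typing import Any, Dict, List

# classificationType values that duplicate an entity class (from the table above)
OVERLAPS = {"DOMAIN": "Domain", "REFERENCE_DOCUMENT": "ReferenceDocument"}


def count_overlaps(classifications: List[Dict[str, Any]]) -> Counter:
    """Count classifications whose classificationType overlaps an entity class."""
    return Counter(
        c["classificationType"]
        for c in classifications
        if c.get("classificationType") in OVERLAPS
    )


def to_reference_document(c: Dict[str, Any]) -> Dict[str, Any]:
    """Re-express a REFERENCE_DOCUMENT classification as a ReferenceDocument."""
    if c.get("classificationType") != "REFERENCE_DOCUMENT":
        raise ValueError("not a REFERENCE_DOCUMENT classification")
    return {"type": "ReferenceDocument", "name": c["name"], "sourceUri": c["namespaceUri"]}
```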

Improve URL Structure and Consistency (6/7)

  • All entities should have a URL

All significant classes should have an ID, which in the case of RDF data is the node URL.

However, many bSDD classes don’t have such a field:

  • Domain, Property, Classification do have namespaceUri
  • Country, Language, Unit don’t have an ID, but they have a field (code or isocode) that can be turned into an ID by prepending an appropriate prefix.
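Minting such IDs could look like the Python sketch below. The example.org prefixes are placeholders chosen for illustration only (bSDD defines no such namespaces), and codes are percent-encoded so that a unit code containing “/” stays a single path segment.

```python
from urllib.parse import quote

# Hypothetical namespace prefixes: placeholders, not part of bSDD
PREFIXES = {
    "Country": "https://example.org/bsdd/country/",
    "Language": "https://example.org/bsdd/language/",
    "Unit": "https://example.org/bsdd/unit/",
}


def mint_id(entity_type: str, code: str) -> str:
    """Build an ID by prepending the type's prefix to its code/isocode."""
    if not code:
        raise ValueError("empty code")
    # safe="" also encodes "/" so the code remains one path segment
    return PREFIXES[entity_type] + quote(code, safe="")
```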

Improve URL Structure and Consistency (7/7)

Property and ClassificationProperty are two different classes, but the latter does not have a distinct URL in GraphQL and JSON.

The same URL is overloaded to identify entities of both classes.

ClassificationProperty instances are thus “second-class” entities: the JSON and RDF entity APIs do not return them separately, but only as part of the respective Classification. For example, a request for the ACResistance property of IfcCableSegment fails:

curl https://identifier.buildingsmart.org/uri/buildingsmart/ifc-4.3/class/IfcCableSegmentCABLESEGMENT/ACResistance

{"":["Classification with namespace URI 'https://identifier.buildingsmart.org/uri/buildingsmart/ifc-4.3/class/IfcCableSegmentCABLESEGMENT/ACResistance' not found"]}
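One way to give ClassificationProperty a distinct URL is to nest it under its Classification with an explicit marker segment, so the server can tell it apart from a Classification URI. A minimal Python sketch of minting and parsing such URIs; the `<classification URI>/prop/<property code>` pattern and both helper names are hypothetical, not an existing bSDD scheme.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: <classification URI>/prop/<property code>
_CP = re.compile(
    r"^(?P<cls>https://identifier\.buildingsmart\.org/uri/[^/]+/[^/]+/class/[^/]+)"
    r"/prop/(?P<prop>[^/]+)$"
)


def classification_property_uri(classification_uri: str, property_code: str) -> str:
    """Mint a distinct URI for a ClassificationProperty."""
    return f"{classification_uri.rstrip('/')}/prop/{property_code}"


def parse_classification_property_uri(uri: str) -> Optional[Tuple[str, str]]:
    """Return (classification URI, property code), or None for other URIs."""
    m = _CP.match(uri)
    return None if m is None else (m.group("cls"), m.group("prop"))
```

A resolver recognizing this pattern could return the ClassificationProperty instead of the “not found” error shown above.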

Modelling issues

Modelling issues (1/8)

  • Unify different solutions in the modelling of Complex Properties

Modelling issues (2/8)

  • Improve modelling of Dynamic Properties

Modelling issues (3/8)

  • Improve relations between entities

Modelling issues (4/8)

  • Add more entities

Modelling issues (5/8)

  • Use class inheritance

Modelling issues (6/8)

  • Improve representation of PropertyValues

Modelling issues (7/8)

  • Improve representation of predefinedValue

Modelling issues (8/8)

  • Improve multilingual support

Data quality

Refactoring

Conclusions and Future Work

Here are further ideas for improvement:

  • improve the bSDD ontology
  • implement more radical data model refactoring to convert “strings” (countries, reference documents, etc.) into “things”
  • link bSDD units of measure to the QUDT ontology
  • perform deeper data quality analysis using SHACL shapes generation and validation provided by Ontotext Platform Semantic Objects
  • address and resolve more data quality issues, including
    • seeking correlations between dimension vectors, units of measure and physical quantities,
    • parsing out enumeration values from Property/ClassificationProperty descriptions and creating the corresponding PropertyValue lists
  • make more graph visualizations
  • obtain more interesting statistics using SPARQL

Acknowledgements

Funding: ACCORD project, Horizon Europe, grant #101056973

Data: buildingSMART Data Dictionary (bSI credits: Leon van Berlo, Artur Tomczak, Erik Baars)
